Learning Phone Embeddings for Word Segmentation of Child-Directed Speech

Authors

  • Jianqiang Ma
  • Çağrı Çöltekin
  • Erhard Hinrichs

Abstract

This paper presents a novel model that learns and exploits embeddings of phone ngrams for word segmentation in child language acquisition. Embedding-based models are evaluated on a phonemically transcribed corpus of child-directed speech, in comparison with their symbolic counterparts using the common learning framework and features. Results show that learning embeddings significantly improves performance. We make use of extensive visualization to understand what the model has learned. We show that the learned embeddings are informative for both word segmentation and phonology in general.
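
As a rough illustration of the idea of using phone n-gram embeddings for segmentation, the sketch below scores a candidate word boundary from the embeddings of the phone bigrams on either side of it. The abstract does not describe the model's architecture or training signal, so everything here is an assumption made for illustration: the names ngram_context and BoundaryScorer, the logistic scoring function, the explicit boundary label passed to update, and the toy phone string are all hypothetical and are not the authors' implementation.

```python
import math
import random


def ngram_context(phones, i, n=2):
    """Return the n phones left and right of the gap before phones[i]."""
    padded = ["<s>"] * n + list(phones) + ["</s>"] * n
    j = i + n  # index of phones[i] inside the padded sequence
    return tuple(padded[j - n:j]), tuple(padded[j:j + n])


class BoundaryScorer:
    """Hypothetical scorer: logistic regression over two n-gram embeddings."""

    def __init__(self, dim=8, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.emb = {}               # n-gram -> embedding vector (learned)
        self.w = [0.0] * (2 * dim)  # weights over [left ; right] concatenation
        self.b = 0.0

    def lookup(self, ngram):
        # Unseen n-grams get a small random vector on first use.
        if ngram not in self.emb:
            self.emb[ngram] = [self.rng.uniform(-0.1, 0.1)
                               for _ in range(self.dim)]
        return self.emb[ngram]

    def score(self, phones, i):
        """Probability-like score that a word boundary precedes phones[i]."""
        left, right = ngram_context(phones, i)
        x = self.lookup(left) + self.lookup(right)
        z = self.b + sum(wk * xk for wk, xk in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, phones, i, is_boundary, lr=0.5):
        """One gradient step on log loss; embeddings are updated too."""
        left, right = ngram_context(phones, i)
        e_left, e_right = self.lookup(left), self.lookup(right)
        x = e_left + e_right  # concatenation copies the current values
        err = self.score(phones, i) - (1.0 if is_boundary else 0.0)
        # Embedding gradient uses the weights as they were before this step.
        grad_x = [err * wk for wk in self.w]
        for k in range(2 * self.dim):
            self.w[k] -= lr * err * x[k]
        self.b -= lr * err
        for k in range(self.dim):
            e_left[k] -= lr * grad_x[k]
            e_right[k] -= lr * grad_x[self.dim + k]


# Toy usage with made-up phone symbols (one character per phone).
phones = list("WAtsDAt")
scorer = BoundaryScorer()
before = scorer.score(phones, 4)
scorer.update(phones, 4, is_boundary=True)
after = scorer.score(phones, 4)
```

In this sketch the embeddings are trained jointly with the classifier weights, so n-grams that behave alike at boundaries can drift toward similar vectors. A symbolic counterpart in the same spirit would replace the lookup table with one-hot indicators for each n-gram.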

Similar resources

Statistical Speech Segmentation and Word Learning in Parallel: Scaffolding from Child-Directed Speech

In order to acquire their native languages, children must learn richly structured systems with regularities at multiple levels. While structure at different levels could be learned serially, e.g., speech segmentation coming before word-object mapping, redundancies across levels make parallel learning more efficient. For instance, a series of syllables is likely to be a word not only because of ...

Finding the gaps: applying a connectionist model of word segmentation to noisy phone-recognized speech data

The Christiansen model of word segmentation [1] is a connectionist framework for modeling how infants combine multiple cues in learning and processing language. Most studies applying this model assume idealized input with adult-like representations of phonemes and features, with little or no degradation of the input signal. From these studies, it is difficult to tell if the model is robust to n...

A statistical model for word discovery in child directed speech

A statistical model for segmentation and word discovery in child directed speech is presented. An incremental unsupervised learning algorithm to infer word boundaries based on this model is described and results of empirical tests showing that the algorithm is competitive with other models that have been used for similar tasks are also presented.

MAP Lexicon is Useful for Segmentation and Word Discovery in Child Directed Speech

An efficient algorithm for segmenting child-directed speech into words has recently been proposed in the Machine Learning journal. This short technical note proposes some modifications to this algorithm. In particular, a slightly more conservative variation of the original approach is proposed that infers word boundaries based simply on the maximum a-posteriori lexicon. Results of empirical tes...

Learning Words and Their Meanings from Unsegmented Child-directed Speech

Most work on language acquisition treats word segmentation (the identification of linguistic segments from continuous speech) and word learning (the mapping of those segments to meanings) as separate problems. These two abilities develop in parallel, however, raising the question of whether they might interact. To explore the question, we present a new Bayesian segmentation model that incorporates...

Publication date: 2016